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ABSTRACT 

The observational limitations of astronomical surveys lead to significant sta- 
tistical inference challenges. One such challenge is the estimation of luminosity 
functions given redshift (z) and absolute magnitude (M) measurements from an 
irregularly truncated sample of objects. This is a bivariate density estimation 
problem; we develop here a statistically rigorous method which (1) does not as- 
sume a strict parametric form for the bivariate density; (2) does not assume 
independence between redshift and absolute magnitude (and hence allows evo- 
lution of the luminosity function with redshift); (3) does not require dividing 
the data into arbitrary bins; and (4) naturally incorporates a varying selection 
function. We accomplish this by decomposing the bivariate density 4>(z, M) via 

log <f>{z, M) = f(z) + g(M) + h(z, M, 9) 

where f and g are estimated nonparametrically, and h takes an assumed paramet- 
ric form. There is a simple way of estimating the integrated mean squared error 
of the estimator; smoothing parameters are selected to minimize this quantity. 
Results are presented from the analysis of a sample of quasars. 

Subject headings: truncation bias, luminosity function, statistical procedures, 
quasars 



Introduction 



Astronomers commonly seek to estimate t he space density o f objects, and a sky survey 
such as the Sloan Digital Sky Survey (SDSS) (jYork et al.l 120001 ) can yield a representative 
sample useful for this purpose, due to the assumed isotropy of the Universe. Figure [1] 
depicts redshift and absolute magnitude measurements for a sample of quasars given in 
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Richards et al.l (120061 ). These are a subset of the SDSS quasar sample (Data Release 3), 
chosen to be statistically valid for purposes such as exploring the evolution with redshift of 
the luminosity function, i.e. the space density of quasars as a function of absolute magnitude. 
This paper describes a new method for estimating these luminosity functions, and presents 
results from the analysis of this quasar sample. 

For the purposes of the statistical inference problem, imagine the dots in Figure [1] 
as observations of bivariate data {(zj, Mj) : i = 1,2, ... ,n} from some distribution with 
probability density <p(z,M), i.e. the probability that a randomly chosen quasar falls in 
a region B is j B <ft(z, M)dz dM. ( Equivalent ly, in a sample of size n, one expects that 
n J B <j)(z, M)dz dM will fall in the region B.) Hence, the luminosity function at redshift z 
is, up to a multiplicative constant, the cross-section of the bivariate density at z, denoted 
<f>[z,*). 

The main challenge is estimation of this bivariate density given truncated data. Only 
objects with apparent magnitude within some range are observable. When this bound on 
apparent magnitude is transformed into a bound on absolute magnitude^, the truncation 
bound takes an irregular shape, varying with redshift. i^-corrections further complicate this 
boundary, leading to the dashed region in Figure HJ Also, the sample is not assumed to 
be complete within this region, and the probability of observing an object will vary with 
position on the sky, along with other factors. Incorporating this selection function into the 
analysis is a secondary challenge. 

Nonparametric estimators are advantageous in cases where either there does not ex- 
ist a commonly agree d upon parametric physi cal model, or there is a desire to validate a 



parametric model. See IWasserman et al.l (120011 ) for an overview of the potential of nonpara- 



metric methods in astronomy and cosmology. A fully nonparametric approach is not possible 
here, since some assumptions must be placed on the form of the density in order to infer 
its shape over the unobservable region. Under such conditions, one approach would be to 
fit a sequence of increasingly complex parametric models in an attempt to obtain a good 
fit to the data. A less subjective alternative is a semiparametric approach which merges 
a nonparametric method with sufficient structure from a parametric form to obtain useful 
results. This work describes a semiparametric approach to estimating the bivariate density, 
and hence the luminosity functions, under irregular truncation. 

This is a long-standing challenge in astronomical data analysis, with a variety of pro- 
posed methods. Interesting qualitative and simulations-based comparisons between different 



1 Here, a flat cosmology with = 0.7, fi m = 0.3, H Q = 70kms 1 Mpc 1 is assumed when making this 
transformation. 
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approaches can be found in IWillmerl (119971 ) and iTakeuchi et al.l (120001 ). A parametric model 
fit using maximum likelihood is a com mon choice, since it a ddresses the t runca tion bias 
in a natural manner; see, for instance, ISandaee et al.l (1979). iBovle et al.l (120001 ) and the 



parametric models fit and referenced in §6 of iRichards et al.l (120061 ). These models have the 
drawback of imposing a tight constraint on the luminosity function in a case where there is 
not a consensus parametric form. 

Some proposed methods are nonparametric, but assume that redshift and absolute 
magnitude are independent, and hence assume that there is no evolution of the luminos- 
ity function with redshift . The se in clude the nonpa rametric maximum likelihood method 



described in iLvnden-Be 



the l/Knax e stimator of 



i ( Il97ll ) and IJacksonl (11974) and adap t ed fo r double truncation i n 



Efron fc Petrosian ( 1999 ) , along with th e met hods in Efstathiou ( 1988 ). Choloniewski (jl986 ) 



Schmidtl (119681 ) and 



Felten Th e semi parametric method of 



Wangl (119891 ) also assumes independence. iMaloney fc Petrosian! ( 119991 ) apply a nonparamet- 
ric technique which assumes independence after having transformed the bivariate data using 
a parametric form. Any method which assumes indepe nden ce can be applied over s mall red- 
shift ranges (usually called bins) . iNicoll fc Segall (119831 ) and IPage fc Carreral (120001 ) describe 
other binning approaches. Binning forces the difficult choices of bin centers and widths, and 
independence is still assumed over the width of the bin. 

This work was motivated by the goal of developing a statistically rigorous method which 
(1) does not assume a strict parametric form for the bivariate density; (2) does not assume 
independence between redshift and absolute magnitude; (3) does not require dividing the 
data into arbitrary bins; and (4) naturally incorporates a varying selection function. This 
was accomplished by decomposing the bivariate density <j>(z, M) into 

log 4>(z, M) = f(z) + g(M) + h(z, M, 0) (1) 

where h(z, M, 9) will take an assumed parametric form; it is intended to model the de- 
pendence between the two random variables. For example, there may be a physical, para- 
metric model for the evolution of the luminosity function which could be incorporated into 
h(z,M,6). Alternatively, one could use h(z,M,6) = QzM as a first-order approximation 
to the dependence. The functions f and g are estimated nonparametrically, with bandwidth 
parameters to control the amount of smoothness in the estimate. Using the quasar sample 
of Figure [Q, t he estimates ob t ained here are quite consistent, if not a bit smoother, than 
those found in lRichards et al.l (120061 ) . This analysis confirms the finding of the flattening of 
the slope of the luminosity function at higher redshift. 

The paper is organized as follows. §2] briefly describes the quasar sample used here. 
§|3] gives an overview of the idea of local maximum likelihood, a nonparametric extension of 
maximum likelihood, and describes in detail the semiparametric approach taken here. §H 
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describes how the integrated mean squared error can be approximated using cross-validation; 
the bandwidths ca n then be chosen to mi nimize this quantity. §5] presents some results from 
the analysis of the iRichards et al.l (120061 ) quasar sample, along with the results from some 
simulations. More detailed deri yations, along w ith theory for approximating the distribution 
of the estimator, can be found in lSchaferl (120061 ). The approach was implemented as a Fortran 
subroutine with R wrappeJE 



2. Data 



The full IRichards et al.l (120061 ) sample, shown in Figure [TJ consists of 15,343 quasars. 
From these, any quasar is removed if it has z > 5.3, z < 0.1, M > —23.075, or M < —30.7. 
In addition, for quasars of redshift less than 3.0, only those with apparent magnitude between 
15.0 and 19.1, inclusive, (after the application of .fT-corrections) are kept; for quasars of red- 
shift greater than or equal to 3.0, only those with apparent magnitude between 15.0 and 20.2 
are retained. These boundaries combine to create the irregular shape shown by the dashed 



l ine in Figure [TJ This truncation removes two groups of quasars from the IRichards et al. 



(120061 ) sample. First, there are 62 quasars removed with M > —23.075. This was done 
to mitigate the effect of the irregularly-shaped, very narrow region in the lower left corner 
of Figure [TJ Second, there are 224 additional quasars with z < 3 and apparent magnitude 
larger than 19.1; these fall in an extremely poorly sampled region, which can also be noted 
from Figure [TJ Hence there are 15,057 quasars remaining after this truncation. 

The sample is not assumed to be complete within this region. Associated with each sam- 
pled quasar is a value for the selection function, which can be interpreted as the probability 
that a quasar at this location, and of these characteristics would be captured by the sample. 
Details regarding how the selection function was approxi mated via simulations , along with 
many other details regarding the sample, can be found in IRichards et al.l (120061 ). 



2 It is available for download, along with documentation, from 

http : //www . stat . emu . edu/ ^cschaf er/BivTrunc 
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3. The Model 

The approach taken here is built upon a nonparametric extension of maximum likelihood 
called local likelihood modeling. This section begins by describing local likelihood density 
estimation in the general case. This is then adapted to the problem at hand, initially for the 
case assuming the random variables are independent. The case where dependence is allowed 
is then described as a simple extension. 



3.1. Local Likelihood Density Estimation 

To contrast the standard global approach to estimation with the local approach em- 
ployed here, consider the following. Assume the data X = (Xl,X 2 , . . . ,X n ) are realizations 
(observations) of independent, identically distributed random variables from a distribution 
with density f . With classic maximum likelihood estimation, one chooses a single estimate 
from among a class of candidates for / ; let T denote this class. Specifically, the maximum 
likelihood estimator (/ MLE ) for /„ is defined as the / G T which maximizes 

n 

or, equivalently, the / G T which maximizes 

n „ 

£log/(X,-)-n f(x)dx. (3) 

3=1 

(The notation Xj simultaneously indicates a random variable with unknown density f , and 
the observed realization of that random variable.) Although written here like a density 
estimation problem, one could imagine the class T being indexed by a parameter 6; hence 
this also captures the usual maximum likelihood estimator for parametric problems. For 
example, one could define T to consist of all Gaussian densities as mean /x and variance a 2 
vary. In cases where each / G T is a density (e.g., the aforementioned Gaussian case), the 
expression in brackets of equation ([2]) is always zero, and thus unnecessary. However, it is 
often advantageous to let T be a wider class of smooth, nonnegative functions; then the 
bracketed term forces / MLE to be a probability density. 

With local modeling, instead of seeking the single member of the class T to be the 
estimate of fo, the goal is to approximate fo(x) for x near u, yielding the local estimate f u . 
Typically, log /o can be approximated locally by a polynomial; in fact, a linear form for log f u 
usually suffices. See Figure 121 On the left plot, the dashed line gives the logarithm of the 



n I / f(x) dx 



(2) 
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Gaussian density with mean zero and variance one. Local linear estimates log f u are shown 
for each of u G { — 2.5, —1.5, 0, 1.5, 2.5}. It is unimportant that f u (x) is not a good estimate 
of fo(x) for x far from u, since many such local estimates will be found and then smoothed 
together. These local estimates were calculated with a simulated data set consisting of 10,000 
values. The method for finding these local estimates is outlined next. 



In independent work, iLoaderl (119961 ) and iHjort fc Joned (119961 ) localized the likelihood 



criterion of equation Q near u G 1R by writing 

n „ 

C u (f u , X) = ^2 u, A) log f u (Xj) -n K*(x, u, A) f u (x) dx, (4) 

i=i J 

where K*(x,u, A) is a kernel function parametrized by A > 0. A standard choice would 
be K*(x, u, A) = K((x — u)/X) where K is a probability density, but more specific forms 
will be considered (and required) below. The choice of A typically has much more influence 
on the estimator than does the choice of the kernel function. The local estimate /„ is 
found by maximizing X) over logf u belonging to some simple class, usually degree p 

polynomials expanded around u: 

log f u (x) = a u0 + a ul (x - u) H h a up (x - u) p . (5) 

Thus, the model is locally parametric with parameters a u0 , . . . , a up . One imagines repeating 
this procedure at a grid of u- values, call this grid Q, and hence obtaining a family of local 
estimates f = {f u : u G Q}. As a result, f is the family f of local estimates which maximizes 

£(f, X)= X). (6) 

u&Q 

The final local likelihood estimator / LL is constructed by smoothing together the local esti- 
mates: 

f LL (x) = ( V K*{x, u, A) f u {x) I / ( V K*(x, u, A) 1 , (7) 



J2K*(x,u,\)f u (xU /(j2 K 



thus making dual use of A. Returning to Figure [2J the plot on the right shows f LL , the result 
of smoothing together 101 local linear estimates (Q consists of 101 values between -3 and 3). 
In this case, A = 0.05. It is clear that the estimate comes very close to the true density. 

In what follows, simply assume that K* is chosen so that 



Y,K*(x,u,X) = l 

ueG 



(8) 
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for all x and hence 



(9) 



This is a departure from the original approach of iLoaderl (119961 ) and lHjort Jones! (119961 ). 
who instead used f(x) = f x (x). 

The criterion £(f, X) appears awkward upon first sight, but it possesses the following 
property: Considering (X 1; . . . , X n ) again as random variables with unknown density f , then 
(£(f , X)) is maximized by choosing the family f which sets f u (x) = f {x) for all u and all x. 
If that choice were made, the estimate would be f LL = f . Thus, since £(f, X) m (£(f, X)), 
the local estimate log/ u will approximate the degree p Taylor expansion of log/ (x) for 
x around u. The expected value of the standard likelihood criterion is also maximized by 
setting the density equal to the truth, but this localized version has the advantage of allowing 
the choice of A to adjust the amount of smoothness in the estimator. In §U an objective 
method for bandwidth selection is described. There is an apparent conflict between the 
choice of A and the choice of the number of local models (the cardinality of Q) since small Q 
will lead to smooth estimates. In the applications here, Q is chosen large, so that the amount 
of smoothing is completely dictated by A. 



3.2. Density Estimation under Truncation 

Now return to the bivariate density estimation problem using truncated astronomical 
data. The available data are denoted z = (z±,Z2,---, z n ) and M = (Mi, M 2 , . . . , M n ), the 
vectors of redshifts and absolute magnitudes, respectively. Let A denote the region outside 
of which the data are truncated and let A(z, *) = {M: (z, M) £ A} denote the cross-section 
of A at z; A(*,M) is defined similarly. Let <p(z,M) denote the unknown joint density of 
random variables z and M. 

The approach taken here originates in the following naive method. For the moment 
assume z and M are independent so that <f)(z, M) = f(z)g(M) where / is the density 
for redshift and g is the density for absolute magnitude. Clearly, the available data allow 
estimation of the redshift density for observable quasars, denote this density /*. This is 
related to / by 

f%z) = kf h(z,M)dM = kf{z) [ g(M)dM (10) 

JA(z,*) JA(z,*) 

where k is a normalizing constant which forces /* to integrate to one. Assuming for the 
moment that g were known, it is possible to turn an estimator for /* into an estimator for 
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/ by solving equation (TTD1) for /: 



/naiveW cc f*(z) I \J g(M) dMj . (11) 

Starting with an initial guess at g, we could iterate between assuming g is known, and 
estimating /, and vice versa. 

This procedure is portrayed in Figure [31 Using the quasar data set described in £j2j the 
upper left plot depicts .4.(1.5,*) along the vertical axis, with absolute magnitudes ranging 
from -29.9 to -25.85. An (arbitrary) assumption is made regarding the density for absolute 
magnitude (g), shown as the solid curve in the upper right plot. For example, one can find 
that 

/ g(M) dM « 0.24, (12) 

and thus conclude that the observed sample catches 24% of the quasars at z = 1.5. (The 
fact that some quasars are missed within A is considered later when the selection function 
is incorporated into the analysis.) The lower left plot shows how the proportion of quasars 
observed varies with redshift, i.e. it is a graph of 

g(M)dM (13) 

A(z,*) 

versus z. The dashed line in the lower right plot is f*(z), the estimated redshift density for 
observable quasars. The solid curve is /naive, as defined above, found by dividing f*(z) by 
the proportion of quasars observed at redshift z, and then normalizing to force the estimate 
to be a density. 

Figure [3] also illustrates problems with this approach. First, the sharp corner of A 
at z = 3.0 leads to a sharp feature in the estimate /naive- In other words, smooth /* 
does not produce a smooth / NAIV b- Second, consider the behavior of f NA1VE (z) for z where 
Ia(z *) 9{M)dM is small, for instance z > 4.0: Even a small error in the estimate of 
Ia(z *) 9(M)dM will lead to a large error in / NA ive(^)- The fundamental challenge is that 
a well-chosen estimator (i.e., well-chosen smoothing parameters) for /* does not necessarily 
lead to /naive being a good estimator for /. In addition, it is possible to construct exam- 
ples where this iterative approach will converge to different estimates starting from different 
initial values for g. 
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3.3. Local Likelihood Density Estimation with Offset 

Despite the aforementioned problems with the use of /naive, that approach can be 
improved using the local likelihood methods of §3.11 In what follows, /* is estimated using 
local polynomial models which include an additive offset term. This offset is chosen so that 
when subtracted off, what remains is a good estimator for /. The procedure is fundamentally 
the same as that for constructing / NA ive: Starting with an initial guess as to the value of 
the density for absolute magnitude (g), the relationship between /, g, and /* (shown in 
equation (flOl) ) is exploited to construct an estimator for /. (Here it is assumed that <f>(z, M) 
is normalized so that f A <j)(z, M) dz dM = 1, but this choice is arbitrary since the estimate 
can be extended outside of A and then renormalized as appropriate.) 

To start, rewrite equation (ITU1) as 

log f*(z) = log (*/(*)) + logf [ g{M)dM) , (14) 

\JA(z,*) J 

where k is the constant required to force f A <j)(z,M) dz dM = 1. Consider the goal of 
estimating f(x) for x near u. Ideally, it would be possible to fit a local model 

log (kf u (x)) = a u0 + a ul (z - u) H h a up (z - uf (15) 

to obtain both the local estimate f u and the needed normalizing constant k, but truncation 
does not allow for direct estimation of /. Instead, write a local version of equation ffl"4|) as 



log/ u *(^)=log(fc/ u (^)) + log / g(M)dM). (16) 

\Ja(z,*) J 

and then substitute in the expression for \og{kf u ) from equation (|T5|) into equation ffTB]) to 
get 

\ogf* u {z) = a u0 + a ul (z -u) + --- + a up (z - uf + log ( / g(M) dM ) . (17) 

\JA(z,*) J 

Of course, it is possible to estimate /* with the available data and equation (TIT)) makes it 
clear that a good way of doing this would be to fit a local polynomial model with 



log(oFFSETf) = log I / g[M)dM 

\Ja(z,*) 



A(z,*) 

included as an offset. (Recall that, for the moment, g is assumed known.) In other words, 
instead of maximizing the local likelihood criterion C u (f*, z) over log/* that are polynomials 
expanded around u (as in equation (jSJ)), maximize over functions of the form 

a u0 + a ul (z — it) H h a up (z - uf + log(oFFSET f ) . (19) 
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Write C u (kf u x OFFSETf, z) as the local likelihood at u when the offset is included. 

Label the parameters which maximize C u (kf u x offset^ z) as a u0 , . . . ,a up . Comparing 
equations ffTBT) and ffTTl) . note that 

a u0 + a ul (z - u) H h a up (z - uf (20) 

is an estimate of \og(kf(z)) and hence 

exp(a u0 + a ul (z - u) H h a up (,2 - u) p ) (21) 

is the local (near u) estimate of kf(z). As before, this is repeated for a grid of values u E Q 
and the result is the family f which maximizes 

£(f X OFFSETf, z) = C u (kf u X OFFSETf, z) , (22) 

and the estimate of kf is found by smoothing together these local estimates: 

exp(a u0 + a ul (z - u) H h a up (2: - uf ) . (23) 

Here, it is stressed that estimates of kf are smoothed together, instead of estimates of /*. 
This is important because now A can be chosen to obtain the optimal amount of smoothing 
for the best estimate of kf. This avoids the problems which were evident at z — 3.0 in 
Figure [3j A method for choosing A is described in £JH Also, the constant k is present in all 
of these estimates, but it will turn out in the next step that this is exactly what we need: 
There is no need to renormalize and get separate estimates of / and k. 

In this next step, g will be estimated holding kf fixed at its current estimate. To ease 
notation, define 

a u (z) = a u0 + a ul (z - u) H h a up (z - uf . (24) 

With an estimate of kf in hand, now let g* denote the density for the observable M so that 
since 

g*(M) = kg(M)[ f(z)dz (25) 
Ja(*,m) 

it follows that 

log£*(M) = log<?(M) + log (k [ f{z) dz) . (26) 

V JA{*,M) J 

Now consider local models of the form 

log^(M) = b v {M) + log (k [ f(z) dz) (27) 

V Ja(*,m) / 
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where 
and now 



b v {M) = b v0 + b m {M -«) + ...+ b vp {M - v 



log k 



(28) 
(29) 



f(z) dz 

'A(*,M) 

is the logarithm of the offset; note that an estimator for this was found above in equation 
fl23D: 



OFFSET 



g 



A(*,M) 



^ K*(z, u, A) exp(a„(z)) 



ueg 



dz. 



This leaves 



(30) 



(31) 



veQ 



as an estimator for g. This is then used to reestimate the offset term used in equation ( TT71) . 
and the process repeats. This is conceptually the same procedure as was used to create 
/naive above, since the estimate of h is found by alternating estimating / and g. 



3.4. The Global Criterion 

This section will tie together the ideas of the previous. The iterative procedure described 
above is computationally tractable, and has intuitive appeal. Remarkably, it is also possible 
to pose the estimation problem in another manner which is not as computationally useful, 
but will lead to analytical results. Define 

£*(f,g,z,M) = ^i^iTO^AK^-) + J2 K *( M J,v,\)b v (M j ) 
j=i ^ u&g vgq 



A 



K*(z, u, A) exp(a u (z)) 

ueg 



dM dz )■ (32) 



as the global criterion. It is a function of both families of local models, f and g. The key 
is to notice that if g is held fixed at its current estimate g, maximizing C*(f , g, z, M) over 
local models f is identical to maximizing £(f x offsety , z) with fixed estimate of the offset 
term. To see this, recall equation ([TBI and note that an estimator for OFFSETf is 



OFFSETf 



ViY""(M,w,A)exp(b t ,(M)) dM 



(33) 
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and from equations (1221 and (@J, 

£(f X OFFSETf, z) = C u (kf u X OFFSETf, z) 



«e6 



tf* w, A) log (kf u (zj)) 



n/ K*(z,u,X)kf u (z) 



j=i [ueg 



(34) 
(35) 
cte (36) 
(37) 



A 



^2 K *i z , u i A ) exp(a^(z)) 
.ueg 



J2K*(M,v,X)exp(h v (M)) 

.veg 



fc" + £*(f,g,z,M) 



dz dM \ (38) 
(39) 



where k' and k" are constants which do not depend on f , and z and z are the lower and 
upper bounds on redshift, respectively. An analogous statement could be made for finding 
g when f is held fixed. Thus, the iterative search method described in §3.21 is equivalent to 
maximizing this global criterion. 



3.5. Including Dependence and the Selection Function 

Until now, the derivation of the approach has assumed that random variables z and 
M are independent. Dependence will be incorporated by including a parametric portion 
h(z, M, 9) so that the assumption becomes that 



log 0(^, M) = f(z) + g(M) + h(z, M, 9) . 



(40) 



A restriction placed on h is that it must be linear in the real- valued parameters 9. In the 
absence of a physically-motivated model, a useful first-order approximation is h(z, M, 9) = 
9zM. The global criterion of equation fl32l is naturally updated to 



£*(f,g,z,M,, 



i=i 



. ueg 



veg 



1 



exj>(h(z,M,B)) 



J2K*(M,v,X)exp(b v (M)) 



veg 



^2 K *i z , u i A ) exp(a u (z)) 



ueg 



dMdz) . (41) 
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Note that with this form, when f and g are held constant, maximizing C* over 6 is equivalent 
to finding the maximum likelihood estimate of 9. Note also that the sum over the n data 
pairs has also been updated to allow specification of a weight Wj > 0. In this case, the natural 
choice for the weight is the inverse of the selection function for that data pair. The intuition 
is that a pair with selection function of 0.5 is "like" two observations at that location. 

Finally, with a criterion of this form, this estimator can be fit into a general class of 
statistical procedures called M-estimators. See the Appendix ( §fA"|) for an overview of M- 
estimators. 



The described procedure returns an estimate normalized to be a probability density 
over the observable region A. Of course, it could be renormalized to meet the goals of the 
analysis, but care should be taken if the renormalization involves multiplying by a constant 
which is itself estimated from the data. In certain cases, namely when there is a small 
sample, this could result in significantly understated standard errors. Luminosity curves 
are usually stated in units of Mpc~ 3 mag _1 , and are obtained by multiplying the bivariate 
density (normalized to be a probability density over .4.) by a redshift-dependent constant; 
thus no adjustment of the standard errors is needed in this case. 



The choice of the bandwidth A (the smoothing parameter) is critical. Choosing A too 
large results in an oversmoothed, highly biased estimator; choosing A too small leads to 
a rough, highly variable estimator. This is the bias/variance tradeoff. Fortunately, it is 
possible to select A to balance these two in a meaningful, objective manner. 

Although this discussion applies in general to the problem of density estimation, here 
it will be described in terms of estimating the bivariate density <fi over A. Let <p\ denote a 
general estimator for which is a function of a smoothing parameter A. Then, 



3.6. 



Normalization of the Estimate 



4. 



Bandwidth Selection 




J A . 




Variance(0 A (2,M)) + Bias 2 (0 A (z, M) ) dzdM (42) 
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is the integrated mean squared error for <p\. IMSE is a natural measure of the error in the 
estimator, and it is apparent from equation fj42|) how it balances the bias and variance of 
the estimator. 

Although IMSE cannot be calculated, there is an unbiased estimator. It holds that 



f (($ x {z,M)-<f){z,M)J 2 ) dzdM 



(j) 2 x (z,M) dz dM 



A 



) 



-2 \ J cj> x (z, M) <f)(z, M)dzdM/ + k 



where k is a constant which does not depend on A, so it can be ignored. Let (j>X(-j){zj, Mj) 
denote the estimate of t he density at ( zj,Mj) found using the data set with this j th data 
pair removed. Following iRudemol (Il982h . 



<j>x(z,M) <j>{z,M) dz dM 



A 



) 



(43) 



so that the least-squares cross-validation score (LSCV), 



» n 

LSCV(A) = / ft x {z,M)dzdM-2n- y y\> 



(44) 



is an unbiased estimator fo r IMSE((ft \) — fc, and hence minimizing; it over A approximates 
minimizing the IMSE. See lHalll (119831 ) and IStonei (119841 ) for theoretical results showing the 
large-sample optimality of choosing smoothing parameters to minimize this criterion. 

Figure H] gives an example of bandwidth selection by minimizing LSCV. Here, 100 
simulated values are taken from the Gaussian distribution with mean zero and variance 
one. The left plot shows how LSCV varies with the choice of bandwidth, and leads to a 
choice of A opt = 1.25. The right plot compares the density estimate using three bandwidths 
(A opt , Aopt/3, 3A opt ) with the true density. With the bandwidth too small, there are nonsmooth 
features, and the bias is low but the variance is high. With the bandwidth too large, the 
estimate is smoothing out the prominent peak in the center. Here, the variance of the 
estimate is low, but the bias is high. The optimal choice gives an estimate close to the truth, 
and is found using a bandwidth which balances estimates of the bias and variance. 

The weighting due to the selection function needs to be taken into account in the 
previous discussion. Recall that the weight Wj is conceptualized as the number of equivalent 
observations represented by this data pair. Thus "leaving out" observation j is achieved by 
reducing its weight from Wj to Wj — 1 in the criterion (equation I4TI) . But one must imagine 
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repeating this Wj times (for each equivalent observation which observation j represents). Let 
n ctt = Yl w j denote the effective sample size. The new relationship is 



(nj± 



Wj<l>M-i){zj,Mj) 



i=i 



4>x(z, M)4>(z, M) dz dM 



(45) 



A 



where 



J H-i) \ z j 



Zj,Mj) now indicates the estimator evaluated at (zj,Mj) when the weight on 



observation j is reduced from Wj to Wj — 1. 

Dir e ct cal culation of the leave-one-out estimates would be computationally intractable. 



S chafer! (120061 ) describes an approximation based on the second-order Taylor expansion of 



the criterion function. This approximation proves to be highly accurate and computationally 
simple. 



4.1. Variable Bandwidths 



The method described in §3.21 involves fitting local polynomial models at each of a grid 
of values u G Q, for both the z and M directions. These derivations were all performed 
assuming fixed bandwidth A used for each of these models. This was merely for notational 
convenience; there is no reason that different bandwidths could not be chosen for each of 
these local models. In fact, given that the variables are on different scales, it would be 
unreasonable to assume the same bandwidth would be a good choice for each. In the results 
given in the next section, a stated bandwidth is assumed to be on the scale of the variables 
after they have been transformed to lie in the unit interval, and bandwidths are given as 
(A«, Am) pairs. Allowing the bandwidth to further vary over the different local models gives 
the overall model fit much flexibility, and LSCV can be minimized as before. A full search 
over this high- dimensional space is not feasible in practice, however. 



5. Results 



This section describes the results of the application of this method to some real and 
simulated data sets. In all cases, linear models are fit when doing the local likelihood 
modeling (p = 1), and Q is a grid of 100 evenly spaced values in both the z and M dimensions. 
The parametric portion is set as h(z, M, 9) = QzM. Bandwidths (A z ,Aa/) are stated as 
proportions of the range for that variable, e.g. \ z = 0.05 means that the bandwidth for the 
local models for redshift cover 5% of the range 0.1 < z < 5.3. 
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5.1. Analysis of SDSS Quasar Sample 

This method was applied to the sample of quasars described in £j2j As stated above, 
the method is capable of incorporating the selection function via differential weighting in 
the criterion (equation (j4ip ). but the selection function does present some challenges in this 
case. For quasars with z « 2.7 the selection function drops as low as 0.04 due to difficulty 
in distinguishing quasars from stars of spectral type A and F. This gives a weight of 25 to 
these quasars, whi ch would be fine if it were exact, but these weights are calculated based 



on simulations and iRichards et al.1 (120061 ) states that the selection function in this region "is 
quite sensitive to such uncertain details of the simulation." They limit the weight on any 
observation to 3.0 to account for this. This limit was also imposed in the analysis here. 

Figure [5] shows how LSCV varies with X z and Xm- The criterion is minimized when 
X z = 0.05 and Xm = 0.17. The grid of values at which LSCV is calculated is spaced by 
0.01 because, as will be seen below, fluctuations of the bandwidths on this scale lead to very 
little change in the estimates. The minimum value is -0.0078262, but no significance can 
be attached to this value, since LSCV is not an unbiased estimate of IMSE, but instead of 
IMSE plus an unknown additive constant. 

Figure El shows, using the solid contours, the estimate of the quasar density (two- 
dimensional luminosity function) as a function of z and M, when X z = 0.05 and Xm = 0.17. 
This estimate is normalized to integrate to one over the entire dashed (observable) region. 
Recall from §3.31 that this is the form which the algorithm provides. Fortunately, this is 
the ideal form for the estimate. The (effective) count of quasars in the surveyed region is 
n cS = 16858.51 and the survey covers 1622 deg 2 . Thus, the quasar count in a region R of 
(z, M) space can be estimated using 

,2' 



(180/tt) 
1622 



<f>(z, M) dz dM 



UR 



(46) 



The estimate of 9 is —0.41, with a standard error of 0.03. Although it is not possible to 
assign physical significance to this value for 9, it is clear that the possibility that 9 = is 
ruled out, and hence there is very strong evidence for evolution of the luminosity function 
with redshift. 

This estimate has an apparent irregularity in the shape of the density estimate for 
z ~ 3.5. (Note the "bumps" in the solid contours for all values of M at z ~ 3.5.) Quasars of 
this redshift are given larger weight due to interference from stars of spectral type G and K. 
Although it is not possible to be certain, it appears that the weighting may not be sufficiently 
accurate for the quasars. The weights may be underestimated leading to a corresponding 
dip in the density estimate. The bandwidth (A z ) is sufficiently small to pick up this artifact. 
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In fact, LSCV forces the bandwidth to be small enough so it can model this feature. It is 
hoped that in future work the uncertainty in the weights can be incorporated into LSCV. 
For comparison, another estimate was constructed using \ z = 0.15 for local models centered 
on redshift values larger than 2.0, while still using X z = 0.05 for z < 2.0. This estimate is 
shown as the dotted contours. The increased smoothing removes the artifact. 

Figure [7] shows the estimated count of quasars with M < —23.075 as a function of 
redshift. As in Figure El the solid curve is the estimate with the LSCV-optimal bandwidths, 
and the dashed estimate is found using the increased smoothing. Figure [H] shows quasar 
counts as a function of absolute magnitude at a collecti on of redshift values. Comparisons are 



made with the estimates given inlRichards et all (120061 ) which were found using the bin-based 



method of iPage &: Carreral (120001 ) . The error bars in both Figures [7] and [8] are one standard 
error, but represent statistical errors only. The error bars do not account for incorrect 
specification for the parametric form h. But, if there is bias from the incorrect specification of 
h, the binned estimates must share these biases. This would be surprising since, while having 
higher variance, estimates constructed from binning do not make assumptions regarding 
the evolution of the luminosity function, and hence a well-constructed estimate should be 
approximately unbiased. 

Figure [8] also provides insight into the sensitivity of the estimate to the bandwidth 
choice. It would be of great concern if small changes in bandwidth led to significant changes 
in the estimate. To explore this, eight additional estimates were constructed using every 
possible combination of X z 6 {0.04,0.05,0.06} and Am G {0.16,0.17,0.18}. The results are 
shown as gray curves in each plot of Figure [BJ but are only visible at M > —25 and z > 3.75. 
The fluctuations are small relative to the size of the error bars. Clearly, the estimates are 
insensitive to these perturbations. 



5.2. Simulation Results 

Simulations were performed to further explore the behavior of the estimator. For these, 
the estimate shown in the dotted contours in Figure M is taken to be the true bivariate 
density; the truncation region is unchanged. The idea is to ask the following: If the truth 
were, in fact, the estimate found here, would this method be able to reach a good estimate 
of the density under identical conditions (same sample size and truncation region)? Hence, 
the new data sets were simulated consisting of 16,589 (z, M) pairs within the observable 
region. The first of these data sets was utilized to find the optimal smoothing parameters; 
these were found to be A 2 = 0.06 and Xm = 0.16. Each of the other 19 data sets was 
analyzed using these values, so that these simulations also provide insight into the adequacy 
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of this approach to bandwidth selection. Figure [9] shows the results from the simulations 
by comparing estimates of the cross-sections of the estimates <fi at four different redshifts. 
Each dashed curve is an estimate from one of the 20 data sets. The solid curve is the truth. 
These results show strong agreement between the estimates and the truth over the regions 
where data are observed. There is some bias in the tails, but this is in regions far from any 
observable data. In addition, these simulations provide strong evidence that the estimates 
of the standard errors are accurate: The variability in the estimates is comparable to the 
size of the error bars. 



6. Summary 

The semiparametric method described here is a strong alternative to previous approaches 
to estimating luminosity functions. The primary advantage is that it allows one to estimate 
the evolution of the luminosity function with redshift without assuming a strict parametric 
form for the bivariate density. Instead, one only needs to specify the parametric form for a 
term which models the dependence between redshift and absolute magnitude. Future work 
will focus on specifying a physically-motivated form for this parametr ic portion, but t he re- 



sults from the analysis of a sample of quasars reproduce well those from lRichards et all (120061 ) 
while only assuming a simple, first-order approximation to the dependence. Other portions 
of the bivariate density are modeled nonparametrically, and are functions of smoothing pa- 
rameters. Using least-squares cross-validation, these smoothing parameters can be chosen in 
an objective manner, by minimizing a quantity which is a good approximation to the inte- 
grated mean squared error. Results from simulations show that, with a data set of this size, 
the method is indeed capable of recapturing the true luminosity curves under the truncation 
observed in these cosmological data sets. 
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The procedure described in $3] can be fit into a general class of statistical estimators 
called M- estimators. In the simplest case, a M-estimator for a parameter is constructed by 
maximizing a criterion of the form 



where (Xi,X 2 , ■ ■ ■ ,X n ) are the observed data, assumed to be realizations of independent, 
identically distributed random variables and (3 is the parameter to be estimated. The func- 
tion ip is some criterion. For example, in the case of finding the maximum likelihood estimate 
of /3, the function (p((3,x) = \ogf (x), where f is the density corresponding to parameter 



This preprint was prepared with the AAS IATgX macros v5.2. 
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7. Appendix 



A. M-estimators 




(Al) 
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(3. Most least squares problems can be stated as M-estimators. Standard theory for M- 
estimators can be applied to obtain an approximation to the distribution of (3 M , which can 
then be used to find standard errors and form confidence intervals. 

In the case at hand, Xj is the pair (zj, Mj), (3 = (f , g, 9), and 
(p((3,Xj) = ^K^Zj^X)^) +J2 K *( M J, v , X ) h v(M j ) + h(z j ,M j ,e) 



u&g 

exp(h(z,M,6)) 



veg 



J2K\M,v,X)ex P (h v (M)) 



^2k*(z,u,X) exp(a u (z)) 
.ueg 



dM diA2) 



See Schafer (2006) to see the derivations of the approximate distribution for the estimator 
in this case. 



The M-estimator could be generalized to the following 

3 



, MW = argmax 
p ee 



(A3) 



where Wj > is the weight given to the j observation. This allows for easy incorporation of 
the selection function into the analysis. The statistical theory for this weighted M-estimator 
is a simple extension of that for the standard M-estimator. 
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Redshift 



Fig. 1. — Quasar data from the Sloan Digital Sky Survey, the sample from iRichards et al. 



(120061 ) . Quasars within the dashed region are used in this analysis. The removed quasars 
are those with M < —23.075, which fall into the irregularly-shaped corner at the lower left 
of the plot, and those with z < 3 and apparent magnitude greater than 19.1, which fall into 
a very sparsely sampled region. 
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Fig. 2. — An illustration of local likelihood density estimation. The dashed line in both plots 
is the logarithm of the Gaussian density with mean zero and variance one (/o in the notation 
of §3.11) . In the left plot, depicted are local linear estimates (/„) of the density for each of 
u E {—2.5, —1.5,0, 1.5,2.5}. A simulated data set consisting of 10,000 values is utilized. In 
fact, local linear estimates are found for 101 values of u equally spaced between -3 and 3. 
These local estimates are smoothed together to get the final estimate (/ L l) shown in the 
right plot. 
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Fig. 3. — An explanation of the naive, but motivating idea. The left plot in the first row 
depicts the cross-section of the observable region at z — 1.5 (denoted .4.(1.5, *)) with absolute 
magnitudes ranging from -29.9 to -25.85. In the right plot, the solid curve is an assumed 
density for absolute magnitude. 24% of the area under this curve falls in .4(1.5, *), thus one 
would assume that the observed sample catches 24% of the quasars at redshift z = 1.5. (For 
now, ignore selection effects.) In the second row, the left plot shows how this proportion 
observed varies with redshift. The dashed line in the right plot is the estimated density for 
observable quasars (/*), i.e. the estimate ignoring truncation. The solid curve is /naive, 
which equals /* divided by the curve on the left and then rescaling to make it a density. 
Note that the estimate at z — 1.5 actually decreases after this adjustment because quasars 
are relatively well-observed at that redshift. Note how the sharp feature in the observable 
region at z — 3.0 creates both the increase in proportion observed and the steep drop of 
/naive at that redshift. 
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Fig. 4. — An illustration of bandwidth selection by minimizing LSCV. The true density 
is the Gaussian with mean zero and variance one, and a sample of size 100 is used in the 
estimation. The chosen bandwidth is 1.25. The plot on the right shows how the optimal 
bandwidth yields an estimate (dashed line) near to the truth (solid line), while choosing the 
bandwidth too small (dash/dot line) or too large (dotted line) leads to poor estimates. 
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Fig. 5. — LSCV as a function of A 2 and X M for the analysis of the quasar data. Each 
dot represents a (X z , X M ) combination for which LSCV was calculated. The criterion is 
minimized when \ z = 0.05 and Am = 0.17. 
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Fig. 6. — Estimates of the bivariate density. The contours are lines of constant density, 
with the estimate normalized to integrate to one over the observable (dashed) region. Thus, 
it is possible to estimate the number of quasars in a particular subset of (z, M)-space by 
integrating this function over that subset, multiplying by the observed count, and then 
dividing by the fraction of the sky covered by this survey. The solid contours are found 
using \ z = 0.05 and \ M = 0.17, which were the values of that minimized LSCV. Note the 
irregularity in the estimate at z ~ 3.5. This can be traced to similar fluctuations in the 
selection function. Another estimate was obtained by keeping \ z = 0.05 for z < 2.0, but 
using A 2 = 0.15 for z > 2.0, and is shown as the dotted contours. Using the larger bandwidth 
smooths out some of these artifacts. 
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Fig. 7. — Estimates of the luminosity function as a function of redshift, integrated over 
absolute magnitudes less than -23.075. The solid curve is the estimate using \ z = 0.05 and 
Am = 0.17. The depicted error bars are for this estimate and represent one standard error; 
these account for statistical errors only. The dashed curve is the smoother estimate found 
by keeping \ z = 0.05 for z < 2.0, but using \ z = 0.15 for z > 2.0. 
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Fig. 8. — Estimates of the l uminosity funct i on at different redshifts (dark solid lines and error bars), 
compared with estimates from Richards et al. ( 2006h (light solid lines and error bars). These are cross- 
sections of the estimate shown in Figure using \ z = 0.05 and Am = 0.17 (the solid contours). Error bars 
represent one standard error and account for statistical errors only. Eight additional estimates were found 
by perturbing A z and Am by ±0.01. These estimates are shown as the gray curves (only visible at M > —25 
and z > 3.75). 
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Fig. 9. — Results from simulations. The solid curve is the truth, and the dashed curves are 
the estimates from each of the 20 simulations. The error bars are one standard error, and 
found by averaging (in quadrature) the error bars over the 20 simulations. 



